-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: evaluation ingestion (no user-facing feature is added) #1764
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Niiice! Left some comments but you walked me through it so will give others a chance to take a look before stamping. Ping me tomorrow.
label: String | ||
explanation: String | ||
spanId: String! | ||
documentPosition: Int! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docs: Adding descriptions for the fields would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
app/schema.graphql
Outdated
score: Float | ||
label: String | ||
explanation: String | ||
spanId: String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to expose the spanId here? If so this becomes a bit less generic. Totally a nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I was just copy/pasting what I had in mind for the Python code.
app/schema.graphql
Outdated
score: Float | ||
label: String | ||
explanation: String | ||
spanId: String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - maybe this was for troubleshooting but not needed I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. will remove. thx
|
||
@strawberry.type | ||
class SpanEvaluation(Evaluation): | ||
span_id: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can mark this as private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
src/phoenix/server/api/types/Span.py
Outdated
@@ -122,6 +124,14 @@ class Span: | |||
description="Cumulative (completion) token count from self and all " | |||
"descendant spans (children, grandchildren, etc.)", | |||
) | |||
span_evaluations: List[SpanEvaluation] = strawberry.field( | |||
description="Span evaluations", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, no need to repeat the name - best to be more verbose and informative in the descriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, that makes sense
src/phoenix/server/api/types/Span.py
Outdated
span_evaluations: List[SpanEvaluation] = [] | ||
document_evaluations: List[DocumentEvaluation] = [] | ||
span_id = span.context.span_id | ||
for evaluation in evals.get_evaluations_by_span_id(span_id) if evals else (): | ||
span_evaluations.append(SpanEvaluation.from_pb_evaluation(evaluation)) | ||
for evaluation in evals.get_document_evaluations_by_span_id(span_id) if evals else (): | ||
document_evaluations.append(DocumentEvaluation.from_pb_evaluation(evaluation)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimization: You can place this code of getting evaluations and documents on the Span node. This will alleviate the load of the query if the query doesn't ask for anything related to evals. If there are multiple fields that require evaluations by span_id, you can wrap that in a dataloader to eliminate the n+1. Can do this as a follow-up but it's a worthwhile refactor as it's always good to eliminate over-fetching if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some links to dataloading: https://leebyron.com/dataloader-v2/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yea, that's right. we had talked about this before. I totally forgot about it
Thread( | ||
target=_load_items, | ||
args=(evals, fixture_evals, simulate_streaming), | ||
daemon=True, | ||
).start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future thought: I might need to have a "dirty" bit to know when evals change so I know how to refetch due to the lack of subscriptions :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how exactly. let's catch up later
src/phoenix/session/evaluaton.py
Outdated
for index, row in evaluations.iterrows(): | ||
subject_id = _extract_subject_id(cast(Union[str, Tuple[str]], index), index_names) | ||
result = _extract_result(row) | ||
evaluation = pb.Evaluation( | ||
name=evaluation_name, | ||
result=result, | ||
subject_id=subject_id, | ||
) | ||
exporter.export(evaluation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
optimization: this pretty inefficient overall - I get that we are trying to leverage existing code but I think uploading this in bulk / chunks feels much more practical?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, totally agree. I was just trying to limit the scope of this PR. Will definitely upgrade in a future PR
src/phoenix/session/evaluaton.py
Outdated
if index_names and index_names[0].endswith("span_id"): | ||
if len(index_names) == 2 and index_names[1].endswith("document_position"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a bit of magic here that would not be intuitive to the reader. Can you figure out how to maybe leverage variable names and a bit of doc-strings to make this easier to groc? Without being intimately familiar with the structure of the data, I think this will go over a reader's head.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. will add docstring
Minor optimization on the PR comment - Might help if you put graphql codeblocks so others can copy / paste and try out the queries for themeselves. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added sample graphql queries to the PR description
app/schema.graphql
Outdated
score: Float | ||
label: String | ||
explanation: String | ||
spanId: String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! I was just copy/pasting what I had in mind for the Python code.
|
||
@strawberry.type | ||
class SpanEvaluation(Evaluation): | ||
span_id: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
src/phoenix/session/evaluaton.py
Outdated
for index, row in evaluations.iterrows(): | ||
subject_id = _extract_subject_id(cast(Union[str, Tuple[str]], index), index_names) | ||
result = _extract_result(row) | ||
evaluation = pb.Evaluation( | ||
name=evaluation_name, | ||
result=result, | ||
subject_id=subject_id, | ||
) | ||
exporter.export(evaluation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, totally agree. I was just trying to limit the scope of this PR. Will definitely upgrade in a future PR
src/phoenix/session/evaluaton.py
Outdated
if index_names and index_names[0].endswith("span_id"): | ||
if len(index_names) == 2 and index_names[1].endswith("document_position"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. will add docstring
app/schema.graphql
Outdated
score: Float | ||
label: String | ||
explanation: String | ||
spanId: String! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. will remove. thx
src/phoenix/server/api/types/Span.py
Outdated
@@ -122,6 +124,14 @@ class Span: | |||
description="Cumulative (completion) token count from self and all " | |||
"descendant spans (children, grandchildren, etc.)", | |||
) | |||
span_evaluations: List[SpanEvaluation] = strawberry.field( | |||
description="Span evaluations", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, that makes sense
src/phoenix/server/api/types/Span.py
Outdated
span_evaluations: List[SpanEvaluation] = [] | ||
document_evaluations: List[DocumentEvaluation] = [] | ||
span_id = span.context.span_id | ||
for evaluation in evals.get_evaluations_by_span_id(span_id) if evals else (): | ||
span_evaluations.append(SpanEvaluation.from_pb_evaluation(evaluation)) | ||
for evaluation in evals.get_document_evaluations_by_span_id(span_id) if evals else (): | ||
document_evaluations.append(DocumentEvaluation.from_pb_evaluation(evaluation)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yea, that's right. we had talked about this before. I totally forgot about it
Thread( | ||
target=_load_items, | ||
args=(evals, fixture_evals, simulate_streaming), | ||
daemon=True, | ||
).start() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how exactly. let's catch up later
label: String | ||
explanation: String | ||
spanId: String! | ||
documentPosition: Int! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/phoenix/session/evaluaton.py
Outdated
@@ -0,0 +1,102 @@ | |||
from typing import Any, Iterable, List, Mapping, Optional, Tuple, Union, cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in the name of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very good eye! thanks for the catch!
except Exception: | ||
return Response(status_code=HTTP_422_UNPROCESSABLE_ENTITY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we return a 500 here instead of a 422?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! it's intended for line 31, so this is actually not the right place to put it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking really good @RogerHYang |
Purpose
Changes
Evaluation
Evaluation
to HttpExporterEvaluation
px.core.evals.Evals
(analogous topx.core.traces.Traces
) to store the receivedEvaluation
px.core.evals.Evals
topx.session.session.Session
spans
query in GraphQLllm_classify
Caveats
GraphQL Sample Output
Use the trace fixture
llama_index_rag
.Span Evaluation Names
GraphQL Query
Span Evaluations and Document Evaluations
GraphQL Query